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ABSTRACT 

We propose a Gaussian mixture model for background subtraction in infrared imagery. Following a Bayesian approach, our 
method automatically estimates the number of Gaussian components as well as their parameters, while simultaneously it avoids 
over/under fitting. The equations for estimating model parameters are analytically derived and thus our method does not require 
any sampling algorithm that is computationally and memory inefficient. The pixel density estimate is followed by an efficient 
and highly accurate updating mechanism, which permits our system to be automatically adapted to dynamically changing 
operation conditions. Experimental results and comparisons with other methods show that our method outperforms, in terms of 
precision and recall, while at the same time it keeps computational cost suitable for real-time applications. 


1. INTRODUCTION 


Pixels values of infrared frames correspond to the relative differences in the amount of thermal energy emitted or reflected from 
objects in the scene. Due to this fact, infrared cameras are equally applicable for both day and night scenarios, while at the 
same time, compared to visual-optical cameras, are less affected by changing illumination or background texture. Furthermore, 
infrared imagery eliminates any privacy issues as people being depicted in the scene can not be identified (U These features 
make infrared cameras prime candidate for persistent video surveillance systems. 

Although, infrared imagery can alleviate several problems associated with visual-optical videos, it has its own unique chal¬ 
lenges such as a) low signal-to-noise ratio (noisy data) and b) almost continuous pixel values that model objects’ temperature. 
Both issues complicate pixel responses modeling. An example of raw thermal responses is presented in Figure [I] where pixel 
values are floating point numbers ranging from 293 to 299 Kelvin degrees. Due to this peculiarity most of conventional com¬ 
puter vision techniques, that successfully used for visual-optical data, can not be applied straightforward on thermal imagery. 

For many high-level vision based applications, either they use visual-optical videos |4|21|2 4| 25| or infrared data mum 
the task of background subtraction constitutes a key component, as this is one of the most common methods for locating moving 
objects, facilitating search space reduction and visual attention modeling. 

Background subtraction techniques applied on visual-optical videos model the color properties of depicted objects |3[|T5j 
and can be classified into three main categories JTO) : basic background modeling jl9|[28), statistical background modeling 
(TT|[27] and background estimation (20|23j . The most used methods are the statistical ones due to their robustness to critical 
situations. In order to statistically represent the background, a probability distribution is used to model the history of pixel 
values intensity over time. Towards this direction, the work of Stauffer and Grimson |22|, is one of the best known approaches. 









Thermal responses for 500 thermal frames 




Fig. 1. Thermal responses for three different points. In contrast to visual-optical videos, where pixels take integer values, 
thermal responses are floating point numbers, corresponding to objects’ temperature. 

It uses a Gaussian mixture model, with fixed number of components, for a per-pixel density estimate. Similar to this approach, 
Makantasis et al. in JT8) propose a Student-t mixture model for background modeling, taking advantage of Student-t distribution 
compactness and robustness to noise and outliers. The works of |29| and (30| extend the method of (22j by introducing a rule 
based on a user defined threshold to estimate the number of components. However, this rule is application dependent and not 
directly derived from the data. Haines and Xiang in p4| address this drawback by using a Dirichlet process mixture model. Due 
to the computational cost of their method, the authors propose a GPU implementation. All of the aforementioned techniques 
present the drawback that objects’ color properties are highly affected by scene illumination, making the same object to look 
completely different under different lighting or weather conditions. 

Although, thermal imagery can provide a challenging alternative for addressing the aforementioned difficulty, there exist 
few works for thermal data. The authors of 00 exploit contour saliency to extract foreground objects. Initially, they uti¬ 
lize a unimodal background modeling technique to detect regions of interest and then exploit the halo effect of thermal data 
for extracting foreground objects. However unimodal background modeling is not usually capable of capturing background 
dynamics. Baf et al. in fT0| present a fuzzy statistical method for background subtraction to incorporate uncertainty into the 
mixture of Gaussians. However, this method requires a predefined number of components making this approach to be appli¬ 
cation dependent. Elguebaly and Bouguila in propose a finite asymmetric generalized Gaussian mixture model for object 
detection. Again this method requires a predefined maximum number of components, presenting therefore limitations when 
this technique is applied on uncontrolled environments. Dai et al. in [3J propose a method for pedestrian detection and tracking 
using infrared imagery. This method consists of a background subtraction technique that exploits a two-layer representation 
(one for foreground and one for background) of infrared frame sequences. However, the assumption made is that the foreground 
is restricted to moving objects, a consideration which is not sufficient for dynamically changing environments. One way to han¬ 
dle the aforementioned difficulties is to introduce a background model, the parameters and the structure of which are directly 
estimated from the data, while at the same time it takes into account the specific properties of infrared imagery. 

1.1. Our contribution 

This work presents background modeling able to provide a per pixel density estimate, taking into account the special charac¬ 
teristics of infrared imagery, such as low signal-to-noise ratio. Our method exploits a Gaussian mixture model with unknown 
number of components. The advantage of such a model is that its own parameters and structure can be directly estimated from 
the data, allowing dynamic model adaptation to uncontrolled and changing environments. 

An important issue in the proposed Gaussian mixture modeling concerns learning the model parameters. In our method, this 
is addressed using a variational inference framework to correspond the functional structure of the model with real data distri- 
























butions obtained from the infrared images. Then, the Expectation-Maximization (EM) algorithm is adopted to fit the outcome 
of variational inference to real measurements. Updating procedures are incorporated to allow dynamic model adaptation to the 
forthcoming infrared data. Our updating method avoids of using heuristics by considering existing knowledge accumulated 
from previous data distributions and then it compensates this knowledge with current measurements. 

Our overall strategy exploits a Bayesian framework to estimate all the parameters of the mixture model and thus avoiding 
over/under fitting issues. To compensate computational challenges arising from the non a priori known nature of the mixture 
model, we utilize conjugate priors and thus we derive analytical equations for model estimation. In this way, we avoid the 
need of any sampling method, which are computationally and memory inefficient. This selection makes our system suitable for 
online and real-time applications. 

This paper is organized as follows: Section [2] formulates the Bayesian framework for mixture modeling. In Section [3] 
we present the procedure to analytically derive the solutions for estimating the distributions of model parameters. Section]?] 
describes the EM algorithm along with setting the priors and parameters initialization. Section [5] discusses the online updating 
mechanism and Section [6] presents the background subtraction task. In Section [7] experimental results are given and Section [5] 
concludes this work. 


2. GAUSSIAN MIXTURE MODELING 


In this section we formulate the Bayesian framework adopted in this paper to analytically estimate all the parameters of the 


proposed Gaussian mixture model. For this reason, in section 2.1 we briefly describe the basic theory behind Gaussian mix¬ 
ture model, while in section |2.2| we describe the introduction of conjugate priors that assist us in yielding analytical model 
estimations as in Section [3] 


2.1. Model fundamentals 

The Gaussian mixture distribution can be seen as a linear superposition of Gaussian functional components, 

K 

p(x\vj,n,r) ='^2m k J\f(x\fi k ,T^ 1 ) ( 1 ) 

k=i 

where the parameters {w^}^ =1 must satisfy 0 < < 1 for every k and Ylk =i = 1 and K is the number of Gaussian 

components. In the proposed mixture modeling, variable K can take any natural value up to infinity. However, it is highly 
recommended to set the upper bound for K less than the cardinality of the dataset, i.e. the number of observed samples. By 
introducing a iT-dimensional binary latent variable z, such as J2k=i z k = ^ and p(zi c = 1) = Wk, the distribution p(x) can be 
defined in terms of a marginal distribution p(z) and a conditional distribution p{pc\z) as follows 

p(x |w, t) = 'Y^p(z\-^)p{x\z, n , r) (2) 

Z 

where p(z \vj) and and p(x\z) are in the form of 
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p(x\z,h,t) = n V(x| p k ,Tp) Zk 

k =1 


( 4 ) 



where /x = {nk}k=i an d r = { T k}%= v correspond to the mean values and precisions of Gaussian components. By introducing 
latent variables and transforming the Gaussian mixture distribution into the form of ([2]), we are able to exploit the EM algorithm 
for fitting our model to the observed data, as shown in Section [4] 

If we have in our disposal a set X = {aq, of observed data we will also have a set Z = {z i, z^} of latent 

variables. Each z n will be a iT-dimensional binary vector, such as Ylk=i z nk = 1, and, in order to take into consideration the 
whole dataset of N samples, the distributions of ([3]) and ([4]) will be transformed to 
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p(z m= n n w k nk (5) 

n=1k=1 

N K 

p(X\Z , = P[ II N( x n\Pk, Tk 1 ) Znk (6) 

n=1k=1 


2.2. Conjugate priors 

To avoid computational problems in estimating the parameters and the structure of the proposed Gaussian model, we introduce 
conjugate priors, over the model parameters /x, r and vj, that allow us to yield analytical solutions. This way the need of using 
sampling methods is prevented. Introduction of priors implies the use of a Bayesian framework for the analysis. 

Let us denote as T = {Z, vj, /x, r} the set which contains all model latent variables and parameters and as q(Y) its 
distribution. Then, our goal is to estimate q(Y) which maximizes model evidence p(X). 


q(Y ) : maxlnp(X) 


(7) 


where in (|7| we used the logarithm of p(X) for calculus purposes. For maximizing 0 we need to define the distribution over 
Y, that is, p(Z\vj) from ([5]), p(w) and p(/x, r). 

Due to the fact that p(Z\vj) is a Multinomial distribution, its conjugate prior is a Dirichlet distribution over the mixing 
coefficients vj 
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where T(-) is the Gamma function. Parameter Ao has a physical interpretation; the smaller the value of this parameter is, the 
larger is the influence of the data rather than the prior on the posterior distribution p(Z\vj). In order to introduce uninformative 
priors and not prefer a specific component against the other, we choose to use a single parameter Ao for the Dirichlet distribution, 
instead of a vector with different values for each mixing coefficient. 

Similarly, the conjugate prior of 0 takes the form of a Gaussian-Gamma distribution, because both fi and r are unknown. 
Subsequently, the joint distribution of parameters /x and r can be modeled as 


p(h,t)=p(h\t)p(t) (9a) 

K 

= P[ Af(p,k\mo, (/3oT k )~ 1 )Gam(T k \a 0 ,bo) (9b) 

k =1 

where Gam(-) denotes the Gamma distribution. 

In order to not express any specific preference about the form of the Gaussian components through the introduction of 
priors, we use uninformative priors by setting the values of hyperparameters mo, A), ao and bo to appropriate values as shown 
in Section 01 



Having defined the parametric form of observed data, latent variables and parameters distributions, our goal is to approx¬ 
imate the posterior distribution p(Y\X) and the model evidence p(X ), where Y = {Z, p,, r} is the set with distribution 

q(Y ), which contains all model latent variables and parameters. Based on the Bayes rule, which states that p(X)p{Y\X) = 
p(X, 1^) the logarithm of distribution p(X) can be expressed as 
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where KL(q\ \ p) is the Kullback-Leibler divergence between q(Y) and p(Y\X) distributions and C(q) is the lower bound of 
lnp(X). Since KL(q\ \ p) is a non negative quantity, equals to zero only if q(Y) is equal to p(Y\X), maximization of In p(X) 
is equivalent to minimizing of KL(q\\p). For minimizing KL(q\\p) and estimating p(X) we exploit the EM algorithm, as 
shown in Section [4] 

By making the assumption, based on variational inference, that the distribution q(Y) can be factorized over M disjoint sets 
such as q(Y) = fl^ii as shown in Q, the optimal solution q*{Yj) corresponds to the minimization of KL(q\\p) is 

given by 

Inq^Yj) =E i#i pnp(X,y)]+C (11) 

where E^ [In p(X, Y")] is the expectation of the joint distribution over all variables Yj for j ^ i and C is a constant. P(X, Y) 
is modeled through ( [T2| ). 

In the following, we present the analytical solution for the optimal distributions q*{Yj) for model parameters and latent 
variables, i.e. the optimal distributions q*(Z), q*(vj), q*(r) and g*(/x|r). 


3. OPTIMAL MODEL PARAMETER DISTRIBUTIONS 

According to ([5]), ([6]), ([8} and ([9]), the joint distribution of all random variables can be factorized as follows 

P(X , Z, xu, r) =p{X\Z, /x, r)p(Z\xu) 
p{tu)p(h\t)p{t) 

X corresponds to the set of the observed variables. All proofs are given in Appendix |A[ 


3.1. Optimal q*{Z) distribution 


Using (11) and the factorized form of (12) the distribution of the optimized factor g* (Z) is given by a Multinomial distribution 
of the form 
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as p n k we have denote the quantity 


pnk = exp H- ^E[ln r k \ - ^ ln27r- 

\(%n l^k) 7/c] ^ 


(14) 


The expected value E[z n fc] of q*(Z) is equal to r n /~. 


3.2. Optimal q* M distribution 

Using (12} and (11) the distribution of the optimized factor q*(vj) is given a Dirichlet distribution of the form 


q*{zu) = 


r(E^i M tt _Afc—1 


n 


TV' II b. 


(15) 


is equal to A^ + Aq, where A^ = J2n =1 r n& represents the proportion of data that belong to the &-th component. 


3.3. Optimal q*{pk\^k) distribution 

Similarly, the distribution of the optimized factor q^(pk^k) is given by a Gaussian distribution of the form 

q*(l^k\rk) = A/'(/ife|m ft ,(/3 fe r) _1 ) 

where the parameters rrik and ftk are given by 


h — P 0 + Nk 
m k = (p 0 m 0 + N k xPj 


where x k is equal to E n =i r nk x n represents the centroid of the data that belong to the /, -lh component. 


(16) 
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(17b) 


3.4. Optimal q*(r k ) distribution 

After the estimation of q* (/i k \r k ), distribution of the optimized factor q* (r k ) is given by a Gamma distribution of the following 
form 

q*(r k ) = Gam(T k \a k ,b k ) (18) 

while the parameters a k and b k are given by the following relations 

N k 

a k = a 0 + — (19a) 

b k = b 0 + 1 N k a k + ( x k - w 0 ) 2 ^ (19b) 

where a k = ^ E«= i( x n ~ x k ) 2 . 





4. DISTRIBUTION PARAMETERS OPTIMIZATION 


After the approximation of random variables distributions, we will use the EM algorithm in order to find optimal values for 
model parameters, i.e. maximize In order to use the EM algorithm, we have to initialize priors hyperparameters Ao, ao, 
bo, mo and po and the model parameters w k , Hk,Tk, Pk, a k , h c and \ k (see Section[3|. 

The parameter Ao can be interpreted as the effective prior number of observations associated with each component. In 
order to introduce an uninformative prior for vj, we set the parameter Ao equal to N/K, suggesting that the same number of 
observations is associated to each component. Parameters ao and bo (positive values due to Gamma distribution) were set to the 
value of 10 -3 . Our choice is justified by the fact that the results of updating equations (19a) and (19b) are primarily affected by 
the data and not by the prior when the values for ao and bo are close to zero. The mean values of the components are described 
by conditional Normal distribution with means mo and precisions PoT k . We introduce an uninformative prior by setting the 
value for mo to the mean of the observed data and the parameter Po = , where vo is the variance of the observed data. 

The convergence of EM algorithm is facilitated by initializing the parameters w k , fi k , r k and p k using the k-means. To 
utilize k-means, the number of clusters, corresponding to the Gaussian components, should be a priori known. Since we create a 
mixture model, the number of Gaussian components should be less or equal to the number of observed data. For this reason we 
set the number of clusters iT ma;r to a value smaller or equal to the number of observations. If we have no clue about the number 
of classes we can set K max to equal N. If we denote as TV k the number of observation that belong to fc-th cluster, then we can 
set the value of parameter to equal the centroid of fc-th cluster, the parameter to equal the proportion of observations for 
the fc-th cluster, the parameter r k to equal v^ 1 , where v k stands for the variance of the data of the fc-th cluster and the parameter 
Pk to equal N^ 1 . Having initialized the parameters w k , fi k , and p k , we can exploit the formula for the expected value of 
a Gamma distribution to initialize the parameters a k and b k to values r k and one respectively. Finally, the initialization of w k 
allows us to initialize the parameter X k , which can be interpreted as the effective number of observations associated with each 
Gaussian component, to the value Nw k . 

After the initialization of model parameters and priors hyperparameters, the EM algorithm can be used to minimize 
KL(q\\p) of (10). During the E step, r nk is calculated given the initial/current values of all the parameters of the model. 
Using ( |13b| ) r nk is given by 


r n k oc KJkTk ' exp 


1/2 PY1 J »k (r „ U .) 2 -~ 


26fe v 2/3 fe , 

Due to the fact that q* (vj) is a Dirichlet distribution and q* (r k ) is a Gamma distribution, w k and f k will be given by 


lnzu k = E[lnc7 fe ] = d>(A fc ) - d>( ^ A, 
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lnffc = E[lnr fe ] = 'I >( a k) ~ In&fe 


( 20 ) 
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where \£(-) is the digamma function. 

During the M step, we keep fixed the value for variables r nk (the value that was calculated during the E step), and we 
re-calculate the values for model parameters using (15), <[17) and ([19). The steps E and M are repeated sequentially untill the 
values for model parameters are not changing anymore. As shown in [^) convergence of EM algorithm is guaranteed because 
bound is convex with respect to each of the factors q(Z), q(vj), q(fi\r) and q(r). 

During model training the mixing coefficient for some of the components takes value very close to zero. Components with 
mixing coefficient less than 1/N are removed (we require each component to model at least one observed sample) and thus 
after training the model has automatically determined the right number of Gaussian components. 







5. ONLINE UPDATING MECHANISM 


Having described how our model fits to N observed data, in this section we present the mechanism that permits our model to 
automatically adapt to new observed data. We use no heuristic rules but statistics based on the observed data. 

Let us denote as x new a new observed sample. Then, there are two cases; either the new observed sample is successfully 
modeled by our trained model, or not. To estimate if a new sample is successfully modeled, we find the closest component to 
the new sample. As a distance metric between components and the new sample, we use the Mahalanobis distance, since this is 
reliable distance measure between a point and a distribution. 

The closest component c to the new sample is the one that presents the minimum Mahalanobis distance D k 

c = arg min D k = arg min \/[x new - p k ) 2 r k (22) 

k k 

The probability of the new sample to belong to c 

P(%new\Pa Zq) = Af (x new \Pc-> T c ) (23) 


where /i c and r c stand for the closest component mean value and precision respectively. 

Let us denote as Q the initially observed dataset. Then, we can assume that the probability to observe the new sample x new 
is given by 

7V e 

p(%new\c) -^rZ^(x neu , \x new e^x new H - e) (24) 


where N e = | {xi E ft : x new — e < Xi < x new + e} \ andU(x new \x new — e, x new + e) is a Uniform distribution with lower 
and upper bounds to equal x new — e and x new + e respectively. Equation ( [24] ) suggests that the probability to observe x new is 
related to the proportion of data that have already been observed around x new . By increasing the neighborhood around x new , 
i.e. increasing the value of e, the quantity U(x new \x new — e , x new + e) is decreasing, while the value of N e is increasing. 

Upon arrival of a new sample x new , we can estimate the optimal range e around x new that maximizes ([24]) as 


e = arg m&xp(x new \e) 

e 


(25) 


Then, if p(x new |/i c , r c ) > p(x new |e) the new observed sample x new can sufficiently represented by our model. Otherwise, 
a new Gaussian component must be created. 

For model updating, we need to define the binary variable o, called ownership and associated with the Gaussian components, 


as 


Ok 


1, if k = c 

0, otherwise 


(26) 


where we recall that c represents the index of the closest component and k is the index of k -th component. 

When the new observed sample is successfully modeled, the parameters for the Gaussian components are updated using the 
following the leader [ 61 approach described as 
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Fig. 2. First row: updating of our model to new observed data. Second row: updating of model presented in J29) . 


where a\ is equal to r k 1 . 


When the new observed sample cannot be modeled by the existing components, a new component is created with mixing 
coefficient vu new , mean value n new and standard deviation a new , the parameters of which are given as 


^new 


1 

N 


l^new — %new 


(2e) 2 ~ 1 

12 


(28) 


From (28), we see that the mixing coefficient for the new component is equal to 1/N since it models only one sample (the 
new observed one), its mean value equals the value of the new sample and its variance the variance the Uniform distribution, 
whose the lower and upper bounds are x new — e and x new + e respectively. When a new component is created the values for the 
parameters for all the other components remain unchanged except mixing coefficients vj, which are normalized to sum 
After each adaptation of the system to new observed samples, either they modeled by the trained model or not, the mixing 
coefficients of the components are normalized to sum to one. 


Figure [2] presents the adaptation of our model (first row) and the model presented in (29| (second row) to new observed 
data. To evaluate the quality of the adaptation of the models, we used a toy dataset with 100 observations. Observed data were 
generated from two Normal distributions with mean values 16 and 50 and standard deviations 1.5 and 2.0 respectively. The 
initially trained models are presented in the left column. Then, we generated 25 new samples form a Normal distribution with 
mean value 21 and standard deviation 1.0. Our model creates a new component and successfully fits the data. On the contrary, 
the model of (29| is not able to capture the statistical relations of the new observations and fails to separate the data generated 
from distributions with mean values 16 and 21 (middle column). The quality of our adaptation mechanism becomes more clear 
in the right column, which presents the adaptation of both models after 50 new observations. 








































Algorithm 1: Overview of Background Subtraction 


1 

capture N frames 


2 

create 7V-length history for each pixel 


3 

initialize parameters (see Section [ij) 


4 

until convergence (training phase: Section ^ 


5 

compute r n k using |2oj) 

0 

6 

recompute parameters using (15), (17) and 

7 

for each new captured frame 

8 

classify each pixel as foreground or background 
(see Section 6b 

9: 

update background model (see Section p| 



6. BACKGROUND SUBTRACTION 

In this section we utilize our model for background subtraction. We initially capture N frames used to create an infrared 
responses history for each pixel. These histories act as observed data and used to train a model for each pixel. To classify 
a pixel of a new captured frame as background or foreground, we compute the probability its value to be represented by 
the mixture model. If this value is larger than a threshold the pixel is classified as background, otherwise it is classified as 
foreground. The threshold can be defined in relation to the parameters of the mixture, in order to be dynamically adapted. For 
example, we can define the threshold to be equal to fi c d= vg c where v is a scalar defining a confidence interval. The overview 
of the background subtraction algorithm is shown in Algorithm 1. 


7. EXPERIMENTAL RESULTS 

For evaluating our algorithm, we used the Ohio State University (OSU) thermal datasets and a dataset captured at Athens 
International Airport (AIA) during Evacuat^] European funding project. OSU datasets contain frames that have been captured 
using a thermal camera and have been converted to grayscale images. In contrast, the AIA dataset contains raw thermal frames 
whose pixel values correspond to the real temperature of objects. 

OSU datasets 00 are widely used for benchmarking algorithms for pedestrian detection and tracking in infrared imagery. 
Videos were captured under different illumination and weather conditions. AIA dataset was captured using a Flir A315 camera 
at different Airside Corridors and the Departure Level. Totally, 10 video sequences were captured, with frame dimensions 

Ihttp://www.evacuate.eu/ 



Fig. 3. Visual results for all datasets. 















Our approach 


MOG 


SBG 






OSU 1 

OSU 2 

AIA 

Our approach 

Precision 
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0.88 
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Recall 

Worst 

0.54 

0.65 

0.25 

Best 

0.96 

0.95 
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FI score 

Worst 

0.60 

0.74 

0.31 
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0.88 

0.94 
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MOG 

Precision 

Worst 

0.52 

0.92 

0.44 

Best 

0.99 

0.99 

0.84 

Recall 

Worst 

0.12 

0.46 

0.12 

Best 

0.39 

0.78 

0.21 

FI score 

Worst 

0.20 

0.62 

0.19 

Best 

0.56 

0.87 

0.33 

SBG 

Precision 

Worst 

0.41 

0.23 

0.42 

Best 

0.66 

0.75 

0.90 

Recall 

Worst 

0.56 

0.78 

0.02 

Best 

0.96 

0.99 

0.10 

FI score 

Worst 

0.53 

0.36 

0.05 

Best 

0.78 

0.59 

0.16 


(a) Precision, recall and FI score 


(b) Best/worst case for precision, recall and FI score 


Fig. 4. Algorithms performance per dataset. 


320 x 240 pixels of total duration 32051 frames, at 7.5fps, that is, about lh and 12mins. During experimentation process 
we used the sequence captured at the Departure Level Entrance 3 which provides a panoramic view of the space. The other 
sequences at corridors, due to narrow space perspective and the fact that videos were captured by mounting the camera at human 
height level, are inappropriate for testing background subtraction algorithms. 

We compared our method with the method presented by Zivkovic in (29]] (MOG), which is one of the most robust and widely 
used background subtraction technique, and with the method for extracting the regions of interest presented in (8j9) (SBG) used 
for thermal data. To conduct the comparison we utilized the objective metrics of recall , precision and FI score on a pixel wise 
manner. Figures [3] visually present the performance of the three methods. As is observed, our method outperforms both MOG 
and SBG on all datasets. While MOG and SBG perform satisfactory on grayscale frames of OSU datasets, their performance 
collapses when they applied on AIA dataset, which contains actual thermal responses. Regarding OSU datasets, MOG algorithm 
while presents high precision it yields very low recall values, i.e. the pixels that have been classified as foreground are indeed 
belong to the foreground class, but a lot of pixels that in fact belong to background have been misclassified. SBG algorithm 
seems to suffer by the opposite problem. Regarding AIA dataset, our method significantly outperforms both methods. In 
particular, while MOG and SBG algorithms present relative high precision, their recall values are under 0.2. Figure[4ja) presents 
average precision, recall and FI score per dataset and per algorithm for all frames examined to give an objective evaluation. In 
Figure |4|b) presents the best and worst case in terms of precision, recall and FI score among all frames examined. 

Regarding computational cost, the main load of our algorithm is in the implementation of EM optimization. In all exper¬ 
iments conducted, the EM optimization converges within 10 iterations. Practically, the time required to apply our method is 
similar to the time requirements of Zivkovic’s method making it suitable for real-time applications. 


8. CONCLUSIONS 

This paper presents a background subtraction method applicable to thermal imagery, based on Gaussian mixture modeling 
with unknown number. We analytically derive the solutions that describe the parameters of the model and we use the EM 
optimization to estimate their values, avoid sampling algorithms and high computational cost. Due to its low computational 
cost and the real-time operation, our method is suitable for real-world applications. 
















































A. APPENDIX 


Using (11) and (12) the logarithm of g* (Z) is given by 


In g*(Z) =E w [lnp(Z|«7)] + 

+ E M>T [lnp(X|Z,//,T)] + C 


substituting 0 and ([6]) into ([29]) we get 

In q*(Z) = ^2^2z nk ( E[lntu fe ] + lE[lnr fe ]- 

ra=1 fe=l '■ 

- 1 In 2n - MiT [(x n - M fc ) 2 r fc ] ^ + C : 


N K 


(29) 


Using (12) and (11) the logarithm of g* (zzr, fi , r) is 


ln5*(tS7,/i,r) = E z [\np(X\Z,iJ,,T)+ 

+ \n.p(Z\w)+ 

+ lnp(vj)+\np(p,,T)] +C = (31a) 

N K 

= EE In J\f (x n |/i/c 5 ) + 

n=l /c=l 

+ E z [lnp(Z|z^)] 

K 

+ \ap(vj) + y^lnp(/i fc ,r fe ) +C (31b) 

k=l 

Due to the fact that there is no term in ( |31b| ) that contains parameters from both sets {vj} and {//, r}, the distribution 
g*(zz7, fj, , r) can be factorized as q{vj, /j, , r) = g(zi7) n&Li g(Mfcj r fc)- The distribution for q*(vj) is derived using only those 
terms of ( |31b| ) that depend on the variable zzr. Therefore the logarithm of q(vj) is given by 


lng*(zi7) = E z [in p(Z\vj)] + In p{vj) + C = 

(32a) 

= f> W ( j pn= 1 r nk+ X 0 -1) +C = 

(32b) 

k=1 


= f> 4 Arfe+A °- 1) +c 

(32c) 

k =1 



We have made use of E[z n fc] = r n k, and we have denote as Nk = J2n=i r nk- 
distribution with hyperparameters A = {7Vfc + Ao}*L 1 . 


(32c) suggests that q*{w) is a Dirichlet 


Using only those terms of (31b) that depend on variables fi and r, the logarithm of g* (//&, Tfc) is given by 










In q*(fi k ,T k ) =\nAf(nk\m 0 ,(/3 0 T k ) 1 )+ 
+ In Gam(Tk\a 0 , &o)+ 


N 

+ ^2E[z nk ] InAf^Xnluk,^ 1 ) +C = 

n= 1 

- rn 0 ) 2 + ^ In(/3 0 r fc )+ 

+ ( a 0 - 1) InTfe - b 0 T k - 
N 


~ ^ ^2 E \- Znk \ ( Xn ~ ! J ’k) 2 Tk + 
n= 1 

i / N \ 

+ 2 ( X! E [^fe] ) ln (/ 3 o'Tfe) + c 

' n=1 ' 


For the estimation of g* (/!& |r^), we use (33) and keep only those factors that depend on fi^. 

In q*{Hk\Tk\. = - Too) 2 - 

N 


2 ^ [^nfe] (^n M/c) T~k — 
n=l 

= ~2 Mfc (A + -^/s) T fc+ 

+ Mfc'Tfc (/?orao + A^feSfeJ + C =>■ 
^(Mfckfc) = {Pk T )~ 1 ) 


where X^ =1 r nfe :r n , /3 k = f3 0 + N k and m k = 4-(/? 0 m 0 + A^x fe ). 


Pk 


(33) 


(34a) 


(34b) 

(34c) 


After the estimation of q*(n k \r k ), logarithm of the optimized the distribution q*(r k ) is given by 


In q*ij k ) = In q*(n k ,T k ) - In q* (fj, k \T k ) 

( _l Nk 

= I no + ~y ~ 1 j Inifc — 


- ^T fe ^/3 0 (/ife - m 0 ) 2 + 

N 

-f 2&o 4“ ^ ) ^nkij^n M/s) 

n=l 

flk (m/c m fe ) ^ + C 


g*(r/c) = Gam(rfe|afc,6fc) 


(35a) 


(35b) 


(35c) 





The parameters a& and bk are given by 


a k — a o H—^ 

h = bo + \ (■ Nkak + -£rk^ k mo)2 ) 


(36a) 

(36b) 


where a k = ^ T,n=i( x n ~ x k) 2 - 
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